Abstract —Evaluating the performance of researchers and 
measuring the impact of papers written by scientists is the 
main objective of citation analysis. Various indices and metrics 
have been proposed for this. In this paper, we propose a new 
citation index Citex, which gives normalized scores to authors 
and papers to determine their rankings. To the best of our 
knowledge, this is the first citation index which simultaneously 
assigns scores to both authors and papers. Using these scores, 
we can get an objective measure of the reputation of an author 
and the impact of a paper. 

We model this problem as an iterative computation on a 
publication graph, whose vertices are authors and papers, and 
whose edges indicate which author has written which paper. 
We prove that this iterative computation converges in the limit, 
by using a powerful theorem from linear algebra. We run 
this algorithm on several examples, and find that the author 
and paper scores match closely with what is suggested by our 
intuition. The algorithm is theoretically sound and runs very 
fast in practice. We compare this index with several existing 
metrics and find that Citex gives far more accurate scores 
compared to the traditional metrics. 

Keywords: Citation Analysis, Graph Algorithms, 
Matrix Computations, Eigenvalues and Eigenvectors, 
Information Retrieval. 

I. Introduction 

In today’s world, numerous papers are written by authors 
in many journals and conferences. It is difficult for people 
to judge the quality and impact of an author or a paper, 
even if they are experts, just by reading a few papers. Thus, 
measuring the relative importance of authors and papers 
published in scientific conferences and journals is very important. 
More importantly, there is a need for an index giving accurate 
results, which can be computed easily for a large collection 
of authors and papers. 

Many metrics are available to evaluate the importance of 
journals, such as the impact factor, the immediacy index, citation 
PageRank, and the Y-factor. There are also many metrics 
which give scores to authors. The most famous of these are 
Hirsch's h-index [13], the individual h-index, Egghe's g-index, 
and Zhang's e-index. However, until now, the only way to 
evaluate the impact of a paper has been to count its citations. 
It has been observed that surveys and review articles receive 
more citations than high-quality original research papers. 
Self-citations also increase the number of citations of a paper. 

This is why we propose a new metric for evaluating papers. 
This new citation index, Citex, gives normalized scores to 
authors and papers to determine their rankings. Using these 
scores, we can get an objective measure of the reputation of 
an author and the impact of a paper. Paper scores are not 
calculated solely from the number of citations. Apart from 
giving scores to papers, we give scores to authors. The author 
scores and paper scores reinforce each other. Thus, an 
influential author increases the score of the papers (s)he 
writes, and an author's score increases if (s)he writes a good 
paper. Since the author score increases the score of a paper 
written by the author, there would be a general tendency to 
co-author papers with an influential author. To counter this, 
we assign scores to authors in such a way that the score of a 
paper is divided uniformly among the people who have 
co-authored it. This basic model can easily be extended to a 
weighted distribution of scores, where a first author who has 
made the highest contribution receives more weight than an 
author who has contributed less. 

The existing literature rates an author based on the number of 
papers that he/she has written, the total number of citations 
received, the average number of citations per paper, etc. In 
Section II, we discuss each of these metrics and state their 
advantages and disadvantages. None of the existing techniques 
take into account how the paper scores of an author 
influence the author's standing in the academic community, 
because the paper score is calculated solely based on the 
number of citations. Our Citex index is inspired by the 
ideas of the PageRank and HITS algorithms 
for ranking web pages. In this scheme, paper scores are 
updated depending on author scores and author scores are 
updated based on paper scores. This is often referred to as 
the Principle of Repeated Improvement. We prove that the 
scores converge in the limit as the number of iterations 
grows. In practice, the scores converge within 
a few iterations. 

The main idea is to consider the authors and papers as a 
network with two disjoint sets of nodes, the set of authors 
and the set of papers. An edge exists between an author and 
a paper, if the author has written the corresponding paper. 
Apart from these, there is a citation subgraph, consisting of 
papers as the vertices. A directed edge exists from node i to 
node j, if paper i cites paper j. At each step, a paper score 
gets uniformly divided amongst all its authors. The paper 
score is the sum of the scores of the authors who have written 
the paper and the scores due to citation. 


The currently available scores like the h-index count the 
number of citations, but not the impact of the papers which 
have cited this paper. They can be easily manipulated by 
self-citations. Moreover, since many indices like the h-index and 
the number of citations are integers, it is difficult to distinguish 
between two authors or papers with the same value of the 
index. Our index has a higher discriminatory power, since it 
is a real number between 0 and 1. It is highly unlikely that 
two authors or papers will have the same value for Citex. 
There is also a non-uniformity across various disciplines. In 
the sciences, there is a general tendency to work in large groups. 
These papers also receive a large number of citations compared 
to papers in computer science. Since the score is divided 
uniformly among multiple authors, each author score will be 
reduced, thus affecting paper scores as well. Our index also 
gives credit to authors who write single-author papers. This 
should not deter people from doing collaborative research. 


The problem can be fine-tuned in a number of ways. We 
can take into consideration the recommendation of authors by 
other authors and consider a weighted distribution. The 
problem also has a number of ramifications. A similar index can 
be designed for a product-customer recommendation system, 
where customers can recommend each other depending upon 
their reputations (similar to citation of papers), and a customer 
can recommend a product (similar to writing papers). The 
difference here is that the score of a product is not uniformly 
divided amongst customers, and information cascades have 
to be taken into account when calculating the product scores. 
Other interdependent networks, which reinforce each other, 
can be treated similarly. 


The rest of the paper is organized as follows. In Section 
II, we discuss the related work on citation analysis, define 
the metrics and compare them. In Section III, we define the 
problem and propose the model to analyze it. We give an 
informal description of the algorithm and present the rules 
to iteratively compute the author and paper scores in Section 
IV. In Section V, we mathematically analyze the iterative 
procedure and prove that it converges to the eigenvectors of 
certain matrices. In Section VI, we execute the algorithm on 
some illustrative examples, and show that the author scores 
and paper scores give a good indication of their importance, as 
can be seen from the underlying graph structure. In Section 
VII, we discuss some extensions of the basic algorithm. We 
conclude the paper in Section VIII with some future 
directions to work on. 


II. Related work 

A. Different metrics used in citation analysis 

Previously, there have been several attempts to measure 
the impact of authors and papers. We list here some of them 
along with their definition. 

1) Number of papers ($N_p$): Total number of papers 
written by an author. 

2) Number of citations ($N_c$): Total number of citations 
for all papers written by an author. 

3) Average number of citations per paper: Ratio of total 
number of citations and total number of papers, i.e., 
$N_c / N_p$. This is sometimes also called the impact factor. 

4) Average number of citations per author: For each 
paper, its citation count is divided by the number of 
authors for that paper to give the normalized citation 
count for the paper. The normalized citation counts 
are then summed across all papers to give the average 
number of citations per author. 

5) Average number of papers per author: For each 
paper, the inverse of the number of authors gives the 
normalized author count for the paper. The normalized 
author counts are then summed across all papers to give 
the average number of papers per author. 

6) Average number of authors per paper: The sum of 
the author counts across all papers, divided by the total 
number of papers. 

7) h-index [13]: An author has index $h$, if $h$ of his/her $N_p$ 
papers have at least $h$ citations each, and the rest of 
the $N_p - h$ papers have no more than $h$ citations each. 

8) g-index: Given a set of articles ranked in decreasing 
order of the number of citations that they received, 
the g-index is the (unique) largest number such that the 
top $g$ articles together received at least $g^2$ citations. 

9) e-index: It is the square root of the surplus citations in 
the h-set beyond the theoretical minimum ($h^2$) required 
to obtain an h-index of $h$. It is useful for highly cited 
scientists and for comparing those with the same 
h-index but different citation patterns. 

10) Number of significant papers: Total number of papers 
with more than c citations for some integer c. 

11) Number of citations to the most cited papers: Total 
number of citations to the k most cited papers for some 
integer k. 

12) Eigenfactor: The Eigenfactor score is a rating of 
the total importance of a scientific journal. Journals are 
rated according to the number of incoming citations, 
with citations from highly ranked journals weighted 
to make a larger contribution to the Eigenfactor than 
those from poorly ranked journals. As a measure of 
importance, the Eigenfactor score scales with the total 
impact of a journal. Journals generating higher impact 
in the field have larger Eigenfactor scores. However, it 
is not clear whether the Eigenfactor gives a better estimate 
than the raw citation count [8]. 
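The definitions of the h-index, g-index and e-index above can be made concrete with a minimal Python sketch. The function names and the citation counts are hypothetical and chosen only for illustration; the g-index is computed in the common variant that caps g at the number of papers.

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def g_index(citations):
    """Largest g such that the top g papers together have at least g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g

def e_index(citations):
    """Square root of the citations in the h-set beyond the minimum h^2."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    return math.sqrt(sum(ranked[:h]) - h * h)

# Hypothetical citation counts for the papers of one author.
citations = [10, 8, 5, 4, 3]
print(h_index(citations), g_index(citations), round(e_index(citations), 3))
# prints: 4 5 3.317
```

For this author, four papers have at least four citations each (h = 4), all five papers together have 30 >= 25 citations (g = 5), and the h-set holds 27 - 16 = 11 surplus citations, so e is the square root of 11.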





TABLE I 

Comparison between different citation metrics 

| Citation Metric | Advantage | Disadvantage |
|---|---|---|
| Number of papers | Measure of productivity. | Importance of papers not considered. |
| Number of citations | Measures impact of an author. | A few highly cited papers increase the total. Survey and review articles are cited more than original research papers. Favors established authors. |
| Average number of citations per paper | Allows comparison of scientists of different ages. | Rewards low productivity. |
| Average number of citations per author | Measures impact of an author. | Difficult to distinguish between authors whose average is the same, but whose citation patterns are different. |
| Average number of papers per author | Measures average productivity. | Does not measure impact of papers. |
| Average number of authors per paper | Measures collaboration between authors. | Does not consider importance of authors and papers. |
| h-index | Measures both the quality and quantity of scientific output. | Does not account for the number of authors of a paper. Different fields with different numbers of citations will have different h-indices. Can be manipulated through self-citations. |
| g-index | Gives more weight to highly-cited articles. | Unlike the h-index, the g-index saturates whenever the average number of citations for all published papers exceeds the total number of published papers. |
| e-index | Differentiates between scientists with identical h-indices but different citations. | Cannot be used independently. Must be used together with the h-index. |
| Number of papers with at least c citations | Measures the broad and sustained impact of an author. | Difficult to find the right value of c. Different values of c favor different authors. |
| Number of citations to the k most cited papers | Identifies the most influential authors. | Not a single number, so difficult to compare. Different values of k favor different authors. |
| Eigenfactor | Takes into account the impact of the citing papers in addition to the number of citations. | Does not give author scores. The importance of an author has to be inferred indirectly from the papers (s)he has written. |


B. Comparison between different citation metrics 

In this section, we compare the different citation indices 
and state their advantages and disadvantages. The comparison 
is presented in Table I. 

C. Comparison with similar works 

There have been a number of previous attempts to rank authors 
and papers based on importance. The SimRank algorithm 
by Jeh and Widom gives a measure of the similarity 
between two objects based on their relationships with other 
objects. Their basic idea is that two objects are similar if they 
are related to similar objects. Note that this only measures 
the similarity of two objects, not their relative ranking, so 
this is different from what we are trying to do. Zhou et 
al. [22] proposed a method for co-ranking authors and their 
publications using several networks associated with authors 
and papers. Although there is some similarity between our 
algorithm and their approach, there are fundamental 
differences between the two. Their co-ranking framework is 
based on coupling two random walks that separately rank 
authors and documents using the PageRank algorithm. Our 
algorithm is designed from scratch and does not use the 
PageRank algorithm. Moreover, our algorithm is much 
simpler and requires far less computation than their method. 
Walker et al. [20] gave a new algorithm called CiteRank. 
The ranking of papers is based on a network traffic model, 
which uses a variation of the PageRank algorithm. A paper 
is selected randomly from the set of all papers with a 
probability that decays exponentially with the age of the 
paper. Chen et al. use a PageRank-based algorithm to assess 
the relative importance of all publications. Their goal is to 
find some exceptional papers or “gems” that are universally 
familiar to physicists. Sun and Giles [19] propose a 
popularity-weighted ranking algorithm for academic digital 
libraries. They use the popularity of a publication venue and 
compare their method with the PageRank algorithm, citation 
counts and the HITS algorithm. 

D. Some structures in citation analysis 

• Collaboration graph: This is a graph associated with 
the authors. The nodes of the graph are the authors. 
There is an undirected edge between two nodes, if the 
corresponding authors have written a paper together. 

• Citation graph: This is a graph associated with the 
papers. The nodes of the graph are the papers. There 
is a directed edge from a paper to another paper, if the 
first paper has cited the second paper. 

• Publication graph: This is a graph relating the authors 
with the papers. The nodes of the graph are the authors 
and the papers. There is an undirected edge between two 
nodes, if the author has written the paper. 
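The three structures are closely related: the collaboration graph can be derived from the publication graph, since two authors are connected exactly when they share a paper. The following Python sketch illustrates this on hypothetical data; the identifiers `authors_of` and `collaboration_edges` are assumptions made for the example.

```python
from itertools import combinations

def collaboration_edges(authors_of):
    """Undirected co-author pairs implied by a paper -> authors mapping."""
    edges = set()
    for authors in authors_of.values():
        # Every pair of co-authors of the same paper is joined by an edge.
        for a, b in combinations(sorted(set(authors)), 2):
            edges.add((a, b))
    return edges

# Hypothetical publication graph: each paper maps to its authors.
authors_of = {"p1": ["a1", "a2"], "p2": ["a2", "a3"], "p3": ["a1"]}

# Hypothetical citation graph: directed (citing paper, cited paper) pairs.
citation_edges = {("p1", "p2"), ("p1", "p3"), ("p2", "p3")}

print(sorted(collaboration_edges(authors_of)))
# prints: [('a1', 'a2'), ('a2', 'a3')]
```

The single-author paper p3 contributes no collaboration edge, while it still appears in the publication and citation graphs.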

III. Problem definition and model 

We have a set of $m$ authors $A = \{a_1, \ldots, a_m\}$ and a 
set of $n$ papers $\mathcal{P} = \{p_1, \ldots, p_n\}$. We represent this by a 
publication graph $G_P = (V_P, E_P)$, whose vertices are the 
set of authors and papers, i.e., $V_P = A \cup \mathcal{P}$. There is an 
undirected edge between author $a_i$ and paper $p_j$, if author $a_i$ 
has written paper $p_j$. Note that this is a symmetric relation, so 
the edges are undirected. Since there are only edges between 
authors and papers, the publication graph is a bipartite graph. 
Associated with this, there is an $m \times n$ publication matrix 
$M$, whose rows and columns are $a_1, \ldots, a_m$ and $p_1, \ldots, p_n$ 
respectively, and whose $(i,j)$th entry $m_{ij} = 1$, if and only 
if author $a_i$ has written paper $p_j$. 

Moreover, there is a citation graph $G_C = (V_C, E_C)$ 
associated with the papers, whose vertices are the set of 
papers, i.e., $V_C = \mathcal{P}$. There is a directed edge from paper $p_j$ 
to paper $p_k$, if paper $p_j$ has cited paper $p_k$. Note that this is 
an asymmetric relation, so the edges are directed. Associated 
with this, there is an $n \times n$ citation matrix $C$, whose 
rows and columns are both $p_1, \ldots, p_n$, and whose $(j,k)$th entry 
$c_{jk} = 1$, if and only if paper $p_j$ has cited paper $p_k$. Note 
that the citation graph cannot have any directed cycle. This is 
because a paper can only cite a previously published paper, 
so they are totally ordered in time. This also means that if 
the papers are numbered in decreasing order of time (newer 
first), the resulting citation matrix will be upper-triangular. 
An example of a publication graph and a citation graph is 
given in Figure 1. 



Fig. 1. The publication and citation graphs for Example 1 showing authors, 
papers and citations. 

The following sets are important for further development. 

1) For an author $a$, $PAPERS(a)$ is defined as the 
set of papers written by author $a$. In other words, 
$PAPERS(a) = \{p \in \mathcal{P} : (a, p) \in E_P\}$. 

2) For a paper $p$, $AUTHORS(p)$ is defined as the set 
of authors who have written paper $p$. In other words, 
$AUTHORS(p) = \{a \in A : (a, p) \in E_P\}$. 

3) For a paper $p$, $CITE(p)$ is defined as the set of papers 
which have cited paper $p$. In other words, 
$CITE(p) = \{q \in \mathcal{P} : (q, p) \in E_C\}$. 

4) For a paper $p$, $REF(p)$ is defined as the set of papers 
which have been given as reference (cited) by paper $p$. 
In other words, $REF(p) = \{q \in \mathcal{P} : (p, q) \in E_C\}$. 
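The four sets can be read directly off the publication matrix M and the citation matrix C. The Python sketch below does this for a small hypothetical instance with two authors and three papers (not one of the examples in this paper); the helper names are assumptions, and indices are 1-based to match the notation above.

```python
def papers(M, i):
    """PAPERS(a_i): papers j with m_ij = 1."""
    return {j + 1 for j, entry in enumerate(M[i - 1]) if entry == 1}

def authors(M, j):
    """AUTHORS(p_j): authors i with m_ij = 1."""
    return {i + 1 for i, row in enumerate(M) if row[j - 1] == 1}

def cite(C, j):
    """CITE(p_j): papers k that cite p_j, i.e. c_kj = 1."""
    return {k + 1 for k, row in enumerate(C) if row[j - 1] == 1}

def ref(C, j):
    """REF(p_j): papers k cited by p_j, i.e. c_jk = 1."""
    return {k + 1 for k, entry in enumerate(C[j - 1]) if entry == 1}

def is_strictly_upper_triangular(C):
    """True if all entries on and below the diagonal are zero."""
    return all(C[j][k] == 0 for j in range(len(C)) for k in range(j + 1))

# Hypothetical instance: a1 wrote p1, p2; a2 wrote p2, p3.
M = [[1, 1, 0],
     [0, 1, 1]]
# p1 cites p2 and p3; p2 cites p3 (papers numbered newest first).
C = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]

print(papers(M, 1), authors(M, 2), cite(C, 3), ref(C, 1))
# prints: {1, 2} {1, 2} {1, 2} {2, 3}
print(is_strictly_upper_triangular(C))
# prints: True
```

Because the papers are numbered newest first and no paper cites itself, the citation matrix here has zeros on and below the diagonal, in line with the remark on upper-triangularity.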

Our goal is to assign scores to authors and papers using 


the structure of the publication and citation graphs, so that 
important authors and papers get higher scores. 

IV. Description of the Citex index 

A. Informal description of the algorithm 

In this section, we give an overview of our proposed 
algorithm. The algorithm maintains a set of author scores and 
paper scores, which are initially set to 1. This initial choice 
of scores is arbitrary, and the scores can be set to any nonzero 
value. Then we update the scores considering the relationship 
between authors and papers (who has authored which paper) 
and relationship between papers (which paper has cited which 
paper). This critically uses the publication graph and the 
citation graph. We use the Principle of Repeated Improvement 
to iteratively compute the new scores based on the 
previous scores. More specifically, the author scores for the 
next iteration are computed from the paper scores for the 
current iteration. The paper scores for the next iteration are 
computed from the author scores and the paper scores for the 
current iteration. In every iteration, we normalize the scores 
by dividing them by the sum of the individual scores, so that 
each of them lies between 0 and 1, and they add up to 1. 
We continue to do this until the author scores and the paper 
scores converge or a specified number of iterations has been 
completed. The Principle of Repeated Improvement states 
that each improvement of author scores will lead to a further 
improvement of paper scores, and vice versa. The final author 
scores and paper scores are the measure of importance of the 
authors and the papers. The higher the score, the higher 
the impact of an author or a paper. 

B. Computing author and paper scores 

For each author $a_i$, we have an author score (a-score) $x_i$, 
and for each paper $p_j$, we have a paper score (p-score) $y_j$. 
We represent the set of author scores as a column vector 
$\mathbf{x} = (x_1, \ldots, x_m)^T$ and the set of paper scores as a column 
vector $\mathbf{y} = (y_1, \ldots, y_n)^T$. We initialize all author and paper 
scores to one, i.e., $\mathbf{x} = \mathbf{y} = \mathbf{1}$. Then, we iteratively update 
the a-scores and p-scores using the following rules. 

1) For each paper $p_j$, its adjusted p-score $\hat{y}_j$ is given by 
the p-score $y_j$ divided by the number of authors who 
have written the paper. In other words, $\hat{y}_j = y_j / k$, for 
$j = 1, \ldots, n$, where $k = |AUTHORS(p_j)|$ is the 
number of co-authors of the paper $p_j$. 

2) For each author $a_i$, set his/her a-score $x_i$ to be the sum 
of the adjusted p-scores of all the papers that he/she has 
authored. In other words, $x_i = \sum_{j \in PAPERS(i)} \hat{y}_j$, for 
$i = 1, \ldots, m$. 

3) For each paper $p_j$, set its p-score $y_j$ to be the sum of the 
a-scores of all the authors who have co-authored the 
paper $p_j$. In other words, $y_j = \sum_{i \in AUTHORS(j)} x_i$, 
for $j = 1, \ldots, n$. 

4) For each paper $p_j$, add to its p-score $y_j$ the sum of 
the p-scores of all the papers which have cited the paper 
$p_j$. In other words, $y_j = y_j + \sum_{k \in CITE(j)} y_k$, for 
$j = 1, \ldots, n$. 

We normalize the scores by dividing the author (paper) 
scores by the sum of the author (paper) scores, so that each 
score lies between 0 and 1, and the sum of the scores is 1. 
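One pass through rules 1 to 4, followed by the normalization, can be sketched in Python as follows. The data layout (dictionaries keyed by paper index) and the function name `citex_step` are assumptions made for this illustration, and the three-paper instance is hypothetical. In rule 4, the sketch adds the rule-3 scores of the citing papers, which is how the rule reads in matrix form.

```python
def citex_step(authors_of, cited_by, y):
    """Apply rules 1-4 once to the paper scores y, then normalize.

    authors_of[j]: authors of paper j; cited_by[j]: papers citing paper j.
    Returns (author scores, paper scores), each summing to 1.
    """
    # Rule 1: adjusted p-score = p-score divided by the number of co-authors.
    adjusted = {j: y[j] / len(authors_of[j]) for j in y}
    # Rule 2: a-score = sum of adjusted p-scores of the author's papers.
    x = {}
    for j, coauthors in authors_of.items():
        for i in coauthors:
            x[i] = x.get(i, 0.0) + adjusted[j]
    # Rule 3: p-score = sum of the a-scores of the paper's authors.
    from_authors = {j: sum(x[i] for i in authors_of[j]) for j in y}
    # Rule 4: add the (rule-3) p-scores of all papers citing this paper.
    new_y = {j: from_authors[j] + sum(from_authors[k] for k in cited_by[j])
             for j in y}
    # Normalization: divide by the sum so that each vector adds up to 1.
    sx, sy = sum(x.values()), sum(new_y.values())
    return ({i: v / sx for i, v in x.items()},
            {j: v / sy for j, v in new_y.items()})

# Hypothetical instance: a1 wrote p1, p2; a2 wrote p2, p3;
# p1 cites p2 and p3; p2 cites p3.
authors_of = {1: [1], 2: [1, 2], 3: [2]}
cited_by = {1: [], 2: [1], 3: [1, 2]}
x, y = citex_step(authors_of, cited_by, {1: 1.0, 2: 1.0, 3: 1.0})
print(x)  # {1: 0.5, 2: 0.5}
print(y)  # {1: 0.125, 2: 0.375, 3: 0.5}
```

Before normalization the paper scores are 1.5, 4.5 and 6, so the most-cited paper p3 already leads after a single iteration.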

V. Mathematical analysis 

A. Analysis of author scores and paper scores 

We observe that the rule $y_j = \sum_{i \in AUTHORS(j)} x_i$ can 
be rewritten as $y_j = \sum_{i=1}^{m} m_{ij} x_i$, since $m_{ij} = 1$ if 
and only if $i \in AUTHORS(j)$. Consider the matrix-vector 
equation $\mathbf{y} \leftarrow M^T \mathbf{x}$. The $j$th row of this equation 
is $y_j = \sum_{i=1}^{m} m_{ij} x_i$. Hence, this matrix-vector equation 
succinctly encodes all $n$ scalar equations for $j = 1, \ldots, n$. 

The corresponding equation for $\mathbf{x}$ is similar, but a little 
more involved. The equation $x_i = \sum_{j \in PAPERS(i)} \hat{y}_j$ can 
be written as $x_i = \sum_{j=1}^{n} m_{ij} \hat{y}_j$, since $m_{ij} = 1$ if and only 
if $j \in PAPERS(i)$. Let $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_n)^T$. Consider the 
matrix-vector equation $\mathbf{x} \leftarrow M \hat{\mathbf{y}}$. The $i$th row of this 
equation is $x_i = \sum_{j=1}^{n} m_{ij} \hat{y}_j$. Hence, this matrix-vector equation 
succinctly encodes all $m$ scalar equations for $i = 1, \ldots, m$. 
Further note that $\hat{y}_j = y_j / |AUTHORS(p_j)| = y_j / \sum_{l=1}^{m} m_{lj}$. Hence, 
$x_i = \sum_{j=1}^{n} \left( m_{ij} / \sum_{l=1}^{m} m_{lj} \right) y_j = \sum_{j=1}^{n} w_{ij} y_j$, where 
$w_{ij} = m_{ij} / \sum_{l=1}^{m} m_{lj}$ is the weight associated with the paper $p_j$. 
Now, $\mathbf{x}$ can be written as $\mathbf{x} \leftarrow W \mathbf{y}$, where $W$ is the $m \times n$ weight 
matrix whose $(i,j)$th entry is $w_{ij}$. 

The equation $y_j = y_j + \sum_{k \in CITE(j)} y_k$ can be rewritten 
as $y_j = y_j + \sum_{k=1}^{n} c_{kj} y_k$, since $c_{kj} = 1$ if and only if 
$k \in CITE(j)$. This can be written as the matrix-vector equation 
$\mathbf{y} \leftarrow (I + C^T) \mathbf{y}$, where $I$ is the $n \times n$ identity matrix. 

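Let the initial author vector and paper vector be $\mathbf{x}^{(0)}$ and 
$\mathbf{y}^{(0)}$ respectively. If we start with the equation $\mathbf{x} \leftarrow W \mathbf{y}$, the 
successive iterations proceed as below. 

$$\mathbf{x}^{(1)} = W \mathbf{y}^{(0)}, \tag{1}$$

$$\tilde{\mathbf{y}}^{(1)} = M^T \mathbf{x}^{(1)} = M^T W \mathbf{y}^{(0)}, \tag{2}$$

$$\mathbf{y}^{(1)} = (I + C^T) \tilde{\mathbf{y}}^{(1)} = (I + C^T) M^T W \mathbf{y}^{(0)}. \tag{3}$$

Similarly, if we start with the equation $\mathbf{y} \leftarrow M^T \mathbf{x}$, the 
successive iterations proceed as below. 

$$\tilde{\mathbf{y}}^{(1)} = M^T \mathbf{x}^{(0)}, \tag{4}$$

$$\mathbf{y}^{(1)} = (I + C^T) \tilde{\mathbf{y}}^{(1)} = (I + C^T) M^T \mathbf{x}^{(0)}, \tag{5}$$

$$\mathbf{x}^{(1)} = W \mathbf{y}^{(1)} = W (I + C^T) M^T \mathbf{x}^{(0)}. \tag{6}$$

Proceeding similarly, at the $k$-th iteration the author and 
paper vectors are given by, 

$$\mathbf{x}^{(k)} = [W (I + C^T) M^T]^k \mathbf{x}^{(0)}, \tag{7}$$

$$\mathbf{y}^{(k)} = [(I + C^T) M^T W]^k \mathbf{y}^{(0)}. \tag{8}$$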
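To make the matrix form concrete, the following Python sketch builds the weight matrix W and the two iteration matrices W(I + C^T)M^T and (I + C^T)M^T W for a small hypothetical instance (two authors, three papers). The helper names are assumptions; plain lists are used instead of a linear-algebra library to keep the sketch self-contained.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def weight_matrix(M):
    """W: each column of M divided by that paper's number of authors."""
    col_sums = [sum(col) for col in zip(*M)]
    return [[entry / col_sums[j] for j, entry in enumerate(row)] for row in M]

def identity_plus_transpose(C):
    """I + C^T, whose (j, k) entry is delta_jk + c_kj."""
    n = len(C)
    return [[(1 if j == k else 0) + C[k][j] for k in range(n)] for j in range(n)]

def iteration_matrices(M, C):
    W = weight_matrix(M)
    A = matmul(identity_plus_transpose(C), transpose(M))  # (I + C^T) M^T, n x m
    P = matmul(W, A)  # W (I + C^T) M^T, m x m, acts on author scores
    Q = matmul(A, W)  # (I + C^T) M^T W, n x n, acts on paper scores
    return P, Q

# Hypothetical instance: a1 wrote p1, p2; a2 wrote p2, p3;
# p1 cites p2 and p3; p2 cites p3.
M = [[1, 1, 0],
     [0, 1, 1]]
C = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
P, Q = iteration_matrices(M, C)
print(P)  # [[2.0, 0.5], [3.0, 2.5]]
print(Q)  # [[1.0, 0.5, 0.0], [2.0, 1.5, 1.0], [2.0, 2.0, 2.0]]
```

Multiplying Q by the all-ones vector gives (1.5, 4.5, 6), which is the unnormalized paper-score vector after one application of the four update rules to this instance.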


B. Proof of convergence of author scores and paper scores 
In this section, we will prove the following theorem. 

Theorem 1. The sequences $\mathbf{x}^{(k)}$ and $\mathbf{y}^{(k)}$, $k = 0, 1, 2, \ldots$, 
converge to the limits $\mathbf{x}^*$ and $\mathbf{y}^*$ respectively. Moreover, $\mathbf{x}^*$ 
is the principal eigenvector of the matrix $W(I + C^T)M^T$ and 
$\mathbf{y}^*$ is the principal eigenvector of the matrix $(I + C^T)M^T W$. 
Further, both $\mathbf{x}^*$ and $\mathbf{y}^*$ are non-negative and non-zero 
vectors. 

Proof: From the above discussion we have $\mathbf{x}^{(k)} = P^k \mathbf{x}^{(0)}$ 
and $\mathbf{y}^{(k)} = Q^k \mathbf{y}^{(0)}$, where $P = W(I + C^T)M^T$ 
and $Q = (I + C^T)M^T W$. Note that $P$ is an $m \times m$ square 
matrix, whereas $Q$ is an $n \times n$ square matrix. Moreover, 
$\mathbf{x}^{(k+1)} = P^{k+1} \mathbf{x}^{(0)} = P \cdot P^k \mathbf{x}^{(0)} = P \mathbf{x}^{(k)}$. If the author 
score $\mathbf{x}^{(k)}$ converges to the vector $\mathbf{x}^*$ in the limit when 
$k \to \infty$, then this vector should satisfy $P \mathbf{x}^* = \mathbf{x}^*$. This 
means that $\mathbf{x}^*$ is an eigenvector of $P$, with the corresponding 
eigenvalue being 1. Similarly, if the paper score $\mathbf{y}^{(k)}$ 
converges to the vector $\mathbf{y}^*$ in the limit when $k \to \infty$, then 
$\mathbf{y}^*$ must be an eigenvector of $Q$, with the corresponding 
eigenvalue being 1. Since the scores are normalized in every 
iteration, these relations hold after $P$ and $Q$ are rescaled by 
a constant, so that the corresponding eigenvalue is 1. 

To prove that a non-negative eigenvalue and a non-negative 
eigenvector exist, we use the following theorem from linear 
algebra. 

Theorem 2 (Perron-Frobenius Theorem). 
Let $A = (a_{ij})$ be an $n \times n$ non-negative matrix, meaning 
that $a_{ij} \ge 0, \forall i, j : 1 \le i, j \le n$. Then the following 
statements hold. 

1) $A$ has a real eigenvalue $c > 0$ such that $c > |c'|$ for 
all other eigenvalues $c'$. 

2) There is an eigenvector $\mathbf{v}$ with non-negative real 
components corresponding to the largest eigenvalue 
$c$: $A\mathbf{v} = c\mathbf{v}$, $v_i \ge 0$, $1 \le i \le n$, and $\mathbf{v}$ is unique 
up to multiplication by a constant. 

3) If the largest eigenvalue $c$ is equal to 1, then for any 
starting vector $\mathbf{x}^{(0)} \ne \mathbf{0}$ with non-negative components, 
the sequence of vectors $A^k \mathbf{x}^{(0)}$ converges to a vector in 
the direction of $\mathbf{v}$ as $k \to \infty$. 

Thus, by the Perron-Frobenius theorem, the author and 
paper scores both converge to unique non-negative vectors 
$\mathbf{x}^*$ and $\mathbf{y}^*$, after repeated applications of the update rules. 
These two vectors are the limiting values of the author and 
paper scores. Moreover, neither of the vectors $\mathbf{x}^*$ and $\mathbf{y}^*$ can 
be the zero vector. At least one of their components must 
be non-zero, because the initial vectors $\mathbf{x}^{(0)}$ and $\mathbf{y}^{(0)}$ are the 
all-1 vectors, and at each iteration the vectors are normalized, 
so that their components add up to 1. ■ 
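The convergence statement can be checked numerically with a normalized power iteration. The Python sketch below is only an illustration under stated assumptions: the 3 x 3 non-negative matrix is a hypothetical stand-in for Q, and the function name is invented for the example. Because the vector is rescaled to sum to 1 in every step, the limit satisfies Q y* = λ y*, where λ is the largest eigenvalue of Q and is recovered here as the scaling factor.

```python
def normalized_power_iteration(Q, iterations=100):
    """Repeatedly apply Q and rescale so that the components sum to 1."""
    n = len(Q)
    y = [1.0 / n] * n  # normalized all-ones start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        z = [sum(q * v for q, v in zip(row, y)) for row in Q]
        eigenvalue = sum(z)  # scaling factor, since sum(y) == 1
        y = [v / eigenvalue for v in z]
    return y, eigenvalue

# Hypothetical non-negative matrix standing in for Q = (I + C^T) M^T W.
Q = [[1.0, 0.5, 0.0],
     [2.0, 1.5, 1.0],
     [2.0, 2.0, 2.0]]

y_star, lam = normalized_power_iteration(Q)

# Residual of the eigenvector equation Q y* = lam y*.
residual = max(
    abs(sum(q * v for q, v in zip(row, y_star)) - lam * y_star[j])
    for j, row in enumerate(Q)
)
print([round(v, 4) for v in y_star], round(lam, 4), residual < 1e-9)
# prints: [0.0714, 0.3571, 0.5714] 3.5 True
```

For this matrix the eigenvalues are 3.5, 1 and 0, and the limit is proportional to (1, 5, 8). Dividing Q by 3.5 gives a matrix whose largest eigenvalue is 1, which is the setting of statement 3 of the Perron-Frobenius theorem.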

C. Time-complexity of each iteration 

We have to multiply some matrices, as can be seen from 
equations (7) and (8). Computing the product of an $m \times n$ 
matrix and an $n \times p$ matrix requires $O(mnp)$ time. Computing 
the matrices $W(I + C^T)M^T$ and $(I + C^T)M^T W$ takes 
$O(mn(m + n))$ time each. Hence, each iteration can be done 
in $O(mn(m + n))$ time. 

VI. Experimental analysis 

We compute the author and paper scores for some graphs 
and show that they match with our intuition. 






A. Example 1 

For the graph in Figure 1, the author and paper vectors, 
publication matrix and citation matrix are given below. Note 
that for this example, $m = 4$, $n = 5$. 

x = [1, 1, 1, 1], y = [1, 1, 1, 1, 1], 

$$M = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

The author and paper scores after 10 iterations are given 
below. These scores converge (up to 3 decimal places) as 
there is no change between two successive iterations. 

x = [0.259, 0.132, 0.289, 0.320], 
y = [0.082, 0.141, 0.264, 0.123, 0.390], 

Intuitively it is clear that author $a_4$ and paper $p_5$ should 
get the highest scores. Author $a_4$ has written 4 papers 
$p_1, p_3, p_4, p_5$ and some of them have high paper scores. 
Similarly, $p_5$ has been written by 3 authors $a_1, a_3, a_4$ and 
cited by 3 papers $p_1, p_3, p_4$, some of which have high scores. 
$a_2$ gets the lowest score as it has written only 2 papers $p_2, p_4$, 
none of which have high paper scores. Similarly, $p_1$ gets the 
lowest score since it has no citations, although it has two 
authors $a_1, a_4$. 

Fig. 2. The publication and citation graphs for Example 2 showing authors, 
papers and citations. 

B. Example 2 

For the graph in Figure 2, the author and paper vectors, 
publication matrix and citation matrix are given below. Note 
that for this example, $m = 5$, $n = 4$. 

x = [1, 1, 1, 1, 1], y = [1, 1, 1, 1], 

$$M = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

The author and paper scores after 10 iterations are given 
below. These scores converge (up to 3 decimal places) as 
there is no change between two successive iterations. 

x = [0.106, 0.590, 0.152, 0.152, 0.000], 
y = [0.212, 0.304, 0.484, 0.000], 

Intuitively it is clear that author $a_2$ and paper $p_3$ should 
get the highest scores. Author $a_2$ has written two papers $p_1$ 
and $p_3$. In turn, $p_3$ has been cited by papers $p_1$, $p_2$ and $p_4$. 
On the other hand, $p_4$ gets the lowest paper score 0, since 
it is not cited by any paper. $a_5$ gets the lowest author score 
since it has only written paper $p_4$, which has a score 0. These 
scores match our intuition. 

VII. Extensions 

A. Weights on edges of the publication graph 

In the basic model, we assumed that all authors contribute 
equally to a paper. However, this is not true in practice. 
Different co-authors make different contributions to a paper. 
We can easily incorporate this feature in our Citex index, 
by slightly modifying the publication matrix $M$. In Section 
V, $M$ was a 0-1 matrix. For the weighted version, we 
construct a matrix $N$, where the edge weight $n_{ij}$ denotes the 
contribution of author $i$ in writing paper $j$. An author having 
a higher contribution has a higher weight, compared to an author 
having a lesser contribution. The new matrix $N$ might not 
be a 0-1 matrix. $W'$ is the matrix corresponding to $W$. 
Hence, $x_i = \sum_{j=1}^{n} w'_{ij} y_j$, where 
$w'_{ij} = n_{ij} / \sum_{l=1}^{m} n_{lj}$ is the weight associated with the paper $p_j$. 
Now, $\mathbf{x}$ can be written as $\mathbf{x} \leftarrow W' \mathbf{y}$, where $W'$ is the weight 
matrix whose $(i,j)$th entry is $w'_{ij}$. 

Now the author and paper vectors at the $k$-th iteration can 
be written as, 

$$\mathbf{x}^{(k)} = [W' (I + C^T) N^T]^k \mathbf{x}^{(0)},$$

$$\mathbf{y}^{(k)} = [(I + C^T) N^T W']^k \mathbf{y}^{(0)}.$$
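As a small illustration of the weighted version, the Python sketch below derives a weight matrix W' from a contribution matrix N. It assumes that W' is obtained in the same way as W, by dividing each column of N by its column sum, so that a paper's score is split among its authors in proportion to their contributions; the matrix N and the function name are hypothetical.

```python
def weighted_weight_matrix(N):
    """W': each entry n_ij divided by the total contribution to paper j.

    Assumption: the normalization mirrors the unweighted matrix W.
    Papers with no recorded contribution get a zero column.
    """
    col_sums = [sum(col) for col in zip(*N)]
    return [
        [entry / col_sums[j] if col_sums[j] else 0.0 for j, entry in enumerate(row)]
        for row in N
    ]

# Hypothetical contributions: 2 authors (rows), 3 papers (columns).
# Author 1 contributed twice as much as author 2 to paper 1.
N = [[2, 1, 0],
     [1, 1, 3]]

W_prime = weighted_weight_matrix(N)
for row in W_prime:
    print([round(w, 3) for w in row])
# prints: [0.667, 0.5, 0.0]
#         [0.333, 0.5, 1.0]
```

Each column of W' sums to 1, so the full score of a paper is still distributed among its authors. With a 0-1 matrix N the sketch reduces to the uniform division of the basic model.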

B. Reputation of authors 

Our Citex index can be modified to include the reputation 
of authors. Each author can rank other authors who he/she 
believes have done original, ground-breaking work. We define 
the author reputation graph $G_R = (V_R, E_R)$, similar to the 
citation graph. The vertices are the set of authors, i.e., $V_R = A$. 
Thus, the number of vertices in this graph is $m$. A directed 
edge from node $i$ to node $j$ of weight $r_{ij}$ exists, if author 
$i$ has rated author $j$ with a score $r_{ij}$. The author reputation 
matrix $R$ is defined similarly. For an author $a$, $REP(a)$ is 
defined as the set of authors who have ranked author $a$. In 
other words, $REP(a) = \{b \in A : (b, a) \in E_R\}$. 

On incorporating the reputation matrix, another rule is 
added to the list. For each author $a_i$, add to its a-score $x_i$ 
the sum of the a-scores of all the authors who have rated 
author $a_i$. In other words, $x_i = x_i + \sum_{k \in REP(i)} x_k = 
x_i + \sum_{k=1}^{m} r_{ki} x_k$, for $i = 1, \ldots, m$. 

Let the initial author vector and paper vector be $\mathbf{x}^{(0)}$ and 
$\mathbf{y}^{(0)}$ respectively. If we start with the equation $\mathbf{x} \leftarrow W \mathbf{y}$, the 
successive iterations proceed as below. 

$$\tilde{\mathbf{x}}^{(1)} = W \mathbf{y}^{(0)},$$

$$\mathbf{x}^{(1)} = (I + R^T) \tilde{\mathbf{x}}^{(1)} = (I + R^T) W \mathbf{y}^{(0)},$$

$$\tilde{\mathbf{y}}^{(1)} = M^T \mathbf{x}^{(1)} = M^T (I + R^T) W \mathbf{y}^{(0)},$$

$$\mathbf{y}^{(1)} = (I + C^T) \tilde{\mathbf{y}}^{(1)} = (I + C^T) M^T (I + R^T) W \mathbf{y}^{(0)}.$$

Proceeding similarly, at the $k$-th iteration the author and 
paper vectors are given by, 

$$\mathbf{x}^{(k)} = [(I + R^T) W (I + C^T) M^T]^k \mathbf{x}^{(0)},$$

$$\mathbf{y}^{(k)} = [(I + C^T) M^T (I + R^T) W]^k \mathbf{y}^{(0)}.$$

C. Evaluation of journals and conferences 

One important question is how to measure the quality of 
scientific journals and conferences. If we have the author 
scores and paper scores of all authors and papers published 
in a journal/conference, we can use the average author score 
and the average paper score as a metric for determining the 
quality of the journal/conference. 


VIII. Conclusion and future directions 


In this paper, we have proposed a new citation metric, 
Citex, to judge the quality of authors and papers in scientific 
publications. Citex assigns scores to authors and papers, 
with higher scores indicating more importance, by analyzing 
the link structures in the publication graph and the citation 
graph. We also considered some extensions to the basic 
scheme. Here are some future directions to work on. 

• In a real-world scenario, authors and papers will be 
added over time. Dynamically modifying the scores 
from the current scores in an incremental fashion is a 
challenging problem. 

• Applying this framework to the setting of customer 
recommendation of products will be an interesting idea. 
Here the nodes of the graph are customers and products. 
A customer can give a score to a product, which is like 
the weighted version of the publication graph. 

• This is an example of an interdependent network, where 
there are two graphs: the collaboration/reputation graph 
and the citation graph. In addition, there is a publication 
graph which records the cross-edges between the nodes 
in the two graphs. Extending Citex to other 
interdependent networks will be an interesting direction 
to think about. 

• Using further parameters such as the time of publication 
and the age of authors, in addition to the link structure, 
will require further thought. 

• Can this technique be extended to directly assign scores 
to journals and conferences, rather than doing it 
indirectly, as in Section VII-C? 

• The analytical power of eigenvector-based methods is 
not yet fully understood. It would be interesting to 
pursue this question in the context of the algorithm 
presented here. Considering random graph models that 
contain enough structure to capture certain global 
properties of the model is a promising direction. 